STR Explorer - workflow for genome-wide association studies based on short tandem repeats
Microsatellites are short tandem repeats (STRs) with a repeat unit of two to six nucleotides. They are hypervariable because length mutations accumulate through intra-allelic polymerase slippage on the microsatellite sequence during replication.
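To make the definition concrete, here is a minimal Python sketch that finds perfect tandem repeats with a unit of two to six nucleotides using a regular expression. It is an illustration only: the function name `find_perfect_strs` and the `min_copies` threshold are assumptions, it does not handle imperfect repeats (interrupted or mutated copies), and it is not the detection method used in this project.

```python
import re


def find_perfect_strs(sequence, min_copies=3):
    """Return (start, unit, copies) for each perfect tandem repeat found.

    Illustrative only: a repeat is a unit of 2-6 nucleotides immediately
    followed by at least `min_copies - 1` exact copies of itself.
    """
    if min_copies < 2:
        raise ValueError("min_copies must be at least 2")
    # Lazy quantifier so the shortest unit is preferred (CA rather than CACA).
    pattern = re.compile(r"([ACGT]{2,6}?)\1{%d,}" % (min_copies - 1))
    hits = []
    for match in pattern.finditer(sequence.upper()):
        unit = match.group(1)
        # Skip homopolymer runs such as AAAA, whose true unit is 1 nucleotide.
        if len(set(unit)) == 1:
            continue
        copies = len(match.group(0)) // len(unit)
        hits.append((match.start(), unit, copies))
    return hits


# A (CA)4 allele and a (CA)5 allele differ by one repeat unit, the kind of
# length change that polymerase slippage produces.
print(find_perfect_strs("GGTCACACACAGT"))    # [(3, 'CA', 4)]
print(find_perfect_strs("GGTCACACACACAGT"))  # [(3, 'CA', 5)]
```

Comparing the copy counts at the same locus across samples is the basic idea behind genotyping an STR.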
Current evidence suggests that STRs play an important role in cancer and other diseases. In colorectal cancer (CRC), variation in STR regions has been shown to influence protein-expression levels and promote tumor progression (Contente et al. 2002). Clinicians today use the instability of microsatellite regions (the MSI marker) to classify CRC tumors into different groups. This marker covers only five microsatellite regions and does not capture the full extent of this phenomenon.
It is currently unknown whether distinct cancer groups exhibit distinct patterns of STR variation, or which CRC-related functional pathways STRs are involved in.
Until very recently, STRs were poorly studied because their highly polymorphic nature complicates their annotation. Moreover, genotyping and analyzing STR variants involves large amounts of data and many different biological data formats. We have developed a pipeline that systematically searches for STR biomarkers in colorectal cancer and can be easily adapted for similar studies of other medical conditions. Our pipeline is a first building block towards a workflow for STR-based personalised medicine.
The statistical framework TRAL (Schaper 2015) was used to find STRs in the human reference genome. We share the logic behind creating a genotyping panel relevant for cancer research and how we arrived at the microsatellite panel we use today. The inferred STRs were then genotyped in more than 400 genomes from patients with colorectal cancer, available to us through TCGA (The Cancer Genome Atlas Network 2012).
# Where?
The results are presented as a relational database, STR Explorer, with programmatic access through a RESTful API. Additional Python modules are available for adding more genotyped data to the database.
STR Explorer is publicly available at []. Comprehensive documentation of the API can be accessed from the main page. Fork the project on GitHub.
To access the repeats for a list of genes of interest, try the following:
[screenshot of usage]
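A request of this kind could be sketched in Python roughly as follows. Everything specific here is a placeholder invented for illustration: the base URL (the real address is not given above), the `/repeats/` route, the `gene_names` query parameter and the helper names are assumptions, not the documented STR Explorer API. Consult the API documentation on the main page for the actual routes and parameters.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Placeholder only: substitute the real STR Explorer address.
BASE_URL = "https://str-explorer.example.org/api"


def build_repeats_url(genes, base_url=BASE_URL):
    """Build a query URL for the repeats of the given genes.

    The '/repeats/' route and 'gene_names' parameter are hypothetical.
    """
    if not genes:
        raise ValueError("at least one gene symbol is required")
    query = urlencode({"gene_names": ",".join(genes)})
    return f"{base_url.rstrip('/')}/repeats/?{query}"


def fetch_repeats(genes, base_url=BASE_URL, timeout=30):
    """Send the GET request and decode the JSON body (assumes a JSON reply)."""
    with urlopen(build_repeats_url(genes, base_url), timeout=timeout) as response:
        return json.load(response)


# Building the URL needs no network access; fetch_repeats() would send it.
print(build_repeats_url(["APC", "KRAS"]))
```

Keeping URL construction separate from the network call makes it easy to check the request before sending it, and to swap in the real base URL and parameter names once they are known.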
The next step is to combine STR annotations from STR Explorer with gene expression analysis and other clinically relevant data to perform a genome-wide association study (GWAS) on STRs, with colorectal cancer as a case study. The inferred STR risk variants can later be validated as novel RNA and protein targets, providing additional information for patient risk stratification and therapy-response prediction.